Round 1: Technical Screening (RDBMS, SQL, Data Modeling)
✅ Screening Test (RDBMS, SQL, Data Modeling)
- RDBMS Concepts
- SQL Questions
- Data Modeling
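For context, a screening-style question in this round could look like the sketch below. The employees table, its columns, and the sample rows are hypothetical; Python's built-in sqlite3 is used only to keep the example self-contained and runnable.

```python
import sqlite3

# Hypothetical screening-style question: "return each department's highest-paid employee".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept    TEXT,
        salary  REAL
    );
    INSERT INTO employees VALUES
        (1, 'Asha',  'Finance', 90000),
        (2, 'Ravi',  'Finance', 75000),
        (3, 'Meera', 'IT',      88000);
""")

# Correlated subquery: keep only rows whose salary equals the max for that department.
rows = conn.execute("""
    SELECT dept, name, salary
    FROM employees e
    WHERE salary = (SELECT MAX(salary) FROM employees WHERE dept = e.dept)
""").fetchall()
print(rows)  # e.g. [('Finance', 'Asha', 90000.0), ('IT', 'Meera', 88000.0)]
```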
Round 2: Advanced Technical (Spark & SQL + Project Discussion)
✅ Project Deep Dive
- Walkthrough of a complex project involving Azure and Spark.
- Architecture-level discussion (data lake, ETL pipelines, orchestration, monitoring).
✅ SQL Coding
Write SQL queries using:
- Joins (inner, left, full)
- Window functions (ROW_NUMBER, RANK, LEAD, LAG)
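A minimal sketch of the kind of query asked here, combining a LEFT JOIN with ROW_NUMBER and LAG. The customers/orders tables, columns, and sample rows are invented for illustration; the interview used its own dataset.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-coding-round").getOrCreate()

# Hypothetical customers/orders data standing in for the interview dataset.
customers = spark.createDataFrame([(1, "Asha"), (2, "Ravi"), (3, "Meera")],
                                  ["customer_id", "name"])
orders = spark.createDataFrame(
    [(101, 1, "2024-01-05", 250.0),
     (102, 1, "2024-02-10", 400.0),
     (103, 2, "2024-01-20", 150.0)],
    ["order_id", "customer_id", "order_date", "amount"])
customers.createOrReplaceTempView("customers")
orders.createOrReplaceTempView("orders")

# LEFT JOIN keeps customers with no orders; ROW_NUMBER sequences each
# customer's orders and LAG pulls the previous order amount.
spark.sql("""
    SELECT c.name,
           o.order_date,
           o.amount,
           ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS order_seq,
           LAG(o.amount)  OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS prev_amount
    FROM customers c
    LEFT JOIN orders o
      ON o.customer_id = c.customer_id
""").show()
```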
✅ Spark Optimization Techniques (Theory)
- Partitioning, caching, and broadcast joins.
- Spark shuffle operations and how to avoid them.
- How to identify and handle data skew in Spark jobs.
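A rough PySpark sketch of two of these ideas: a broadcast join so the large side is not shuffled, and key salting to spread a skewed key across buckets. The fact/dim frames and the salt bucket count N are assumptions made for illustration, not the actual interview data.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-optimization-theory").getOrCreate()

# Hypothetical large fact frame and small dimension frame.
fact = spark.range(100_000).withColumn("dim_id", (F.col("id") % 10).cast("int"))
dim = spark.createDataFrame([(i, f"dim_{i}") for i in range(10)], ["dim_id", "label"])

# Broadcast join: ships the small table to every executor, so the large
# side avoids a shuffle across the cluster.
joined = fact.join(F.broadcast(dim), "dim_id")
joined.cache()  # cache when the result feeds several downstream actions

# Salting sketch for a skewed join key: add a random salt to the large side
# and replicate the small side across the same salt range, then join on both.
N = 8  # assumed number of salt buckets
salted_fact = fact.withColumn("salt", (F.rand() * N).cast("int"))
salted_dim = dim.crossJoin(
    spark.range(N).select(F.col("id").cast("int").alias("salt")))
salted_join = salted_fact.join(salted_dim, ["dim_id", "salt"])

print(joined.count(), salted_join.count())
```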
Round 3: Hands-on Coding and Optimization
✅ Advanced SQL + PySpark
- Write a complex SQL query using multiple window functions and common table expressions (CTEs).
- Convert the same SQL logic to PySpark DataFrame code.
- Show use of withColumn, window, groupBy, agg, etc.
- Demonstrate how to handle missing/null values and schema evolution in Spark.
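The sketch below shows roughly how such SQL logic maps onto the DataFrame API: groupBy/agg for the aggregate, Window with row_number and lag, withColumn for derived columns, and fillna for nulls. The sales data and column names are invented, and the commented mergeSchema read is just one standard Spark option for handling schema evolution in Parquet sources.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("advanced-sql-pyspark").getOrCreate()

# Hypothetical monthly sales data with a missing revenue value.
sales = spark.createDataFrame(
    [("2024-01", "north", 100.0),
     ("2024-02", "north", None),
     ("2024-01", "south", 80.0),
     ("2024-02", "south", 120.0)],
    ["month", "region", "revenue"])

# Handle missing/null values before aggregating (here: treat nulls as 0).
clean = sales.fillna({"revenue": 0.0})

# DataFrame equivalent of a CTE that aggregates per region/month and then
# applies window functions over the aggregate: groupBy/agg + withColumn + Window.
rank_w = Window.partitionBy("region").orderBy(F.desc("monthly_revenue"))
trend_w = Window.partitionBy("region").orderBy("month")
ranked = (clean.groupBy("region", "month")
               .agg(F.sum("revenue").alias("monthly_revenue"))
               .withColumn("rank_in_region", F.row_number().over(rank_w))
               .withColumn("prev_month_revenue", F.lag("monthly_revenue").over(trend_w)))
ranked.show()

# Schema evolution: reconcile Parquet files written with slightly different schemas.
# evolved = spark.read.option("mergeSchema", "true").parquet("/path/to/table")
```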
✅ Spark Basics & Optimization
- Spark execution plan (DAG) and explain() usage.
- Difference between narrow and wide transformations.
- Spark job stages and how to monitor them in Spark UI.
- Optimization techniques in practice:
- Use of persist/cache
- Coalesce vs Repartition
- Avoiding UDFs, choosing built-in functions
- Broadcast joins in skewed data scenarios
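A compact sketch touching several of these points: explain() on a wide (shuffling) transformation, persist, coalesce vs repartition, and a built-in function instead of a Python UDF. The DataFrame contents and partition counts are arbitrary examples, not the interview's actual exercise.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark import StorageLevel

spark = SparkSession.builder.appName("spark-basics-optimization").getOrCreate()

df = spark.range(100_000).withColumn("bucket", (F.col("id") % 4).cast("int"))

# Narrow transformation: row-wise, no shuffle.
narrow = df.withColumn("doubled", F.col("id") * 2)
# Wide transformation: groupBy forces an Exchange (shuffle) between stages.
wide = df.groupBy("bucket").agg(F.count(F.lit(1)).alias("rows"))

# Inspect the physical plan; the shuffle appears as an Exchange node,
# and each shuffle boundary maps to a stage visible in the Spark UI.
wide.explain()

# persist/cache when a DataFrame is reused by multiple actions.
narrow.persist(StorageLevel.MEMORY_AND_DISK)

# coalesce only merges existing partitions (no shuffle); repartition
# triggers a full shuffle and can also redistribute by a column.
fewer = narrow.coalesce(2)
rebalanced = narrow.repartition(16, "bucket")
print(fewer.rdd.getNumPartitions(), rebalanced.rdd.getNumPartitions())

# Prefer built-in functions over Python UDFs so Catalyst can optimize them.
with_label = df.withColumn("id_label", F.concat(F.lit("id-"), F.col("id").cast("string")))
with_label.show(3)
```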
Round 4: HR
- Resume walkthrough and project highlights.
- Skills assessment based on past roles.
- Availability to join and preferred location.
- Work authorization and long-term career goals.